Prognostic DNA mutation and mRNA expression analysis of perineural invasion in oral squamous cell carcinoma

This study analyzed oral squamous cell carcinoma (OSCC) genomes and transcriptomes in relation to perineural invasion (PNI) and prognosis using Cancer Genome Atlas data and validated these results with GSE41613 data. Gene set enrichment analysis (GSEA), gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes were conducted. We identified 22 DNA mutations associated with both overall survival (OS) and PNI. Among them, TGFBR1 and RPS6KA4 mRNAs were overexpressed, while TYRO3 and GPR137 mRNAs were underexpressed in PNI patients. Among the 141 mRNA genes associated with both OS and PNI, we found overlap with PNI-related DNA mutations, including ZNF43, TEX10, TPSD1, and PSD3. In GSE41613 data, TGFBR1, RPS6KA4, TYRO3, GPR137, TEX10 and TPSD1 mRNAs were expressed differently according to the prognosis. The 22 DNA-mutated genes clustered into nervous system development, regulation of DNA-templated transcription, and transforming growth factor beta binding. GSEA analysis of mRNAs revealed upregulation of hallmarks epithelial mesenchymal transition (EMT), TNFα signaling via NF-κB, and IL2 STAT5 signaling. EMT upregulation aligned with the TGFBR1 DNA mutation, supporting its significance in PNI. These findings suggest a potential role of PNI genes in the prognosis of OSCC, providing insights for diagnosis and treatment of OSCC.


Patient population
Data of patients with OSCC who had undergone surgical resection were obtained from the Cancer Genome Atlas (TCGA) Portal.Patients with tumors located in the pharynx, hypopharynx, larynx, and oropharynx were excluded from the study.Among the initial 320 OSCC patients, 15 patients with a prior malignancy, 21 patients with an unknown stage, and 56 patients with unknown perineural invasion status were excluded.This left a total of 228 patients for analysis.Whole exome somatic mutation data from 224 patients and mRNA sequencing data from 218 patients were used in the study.The Genomic Data Commons' (GDC) recommended clinical elements and survival outcome data was utilized 12 .Additionally, GSE41613 (https:// www.ncbi.nlm.nih.gov/ geo/ query/ acc.cgi? acc= GSE41 613) served as a validation set in the gene expression omnibus (GEO) datasets.Out of the 97 HPV-negative OSCC patients, 96 were analyzed and verified, excluding 1 patient with an unknown treatment status.This study adhered to the GDC Data Use Agreement and the study-specific Data Use Certification Agreement available in the database of Genotypes and Phenotypes (dbGaP).

Somatic mutation and mRNA sequencing data
Whole exome somatic mutation and mRNA sequencing (RNA-Seq) data, released on March 29, 2022, were downloaded from the TCGA data portal.Somatic variants were obtained in the mutation annotation format (MAF).Whole exome sequencing was performed using the Illumina Genome Analyzer IIX platform.Loss-offunction mutations were defined as nonsense mutations, frameshift indels, and splice-site mutations.RNA-Seq by Expectation Maximization (RSEM) data were generated using the Illumina HiSeq 2000 RNA-Seq platform (version 2), and normalized RSEM data were used for estimating mRNA expression.

Statistical analysis
Univariate and multivariate Cox regression analyses were conducted to estimate Hazard Ratios (HRs) and their corresponding 95% confidence intervals (95% CIs) for overall survival after adjusting for all major clinical variables.The underlying assumptions of the Cox models were verified through regression diagnostics using Schoenfeld and df beta residuals.Survival analysis was performed using the Kaplan-Meier method, and overall survival was compared between two groups using the log-rank test.Fisher's exact tests were employed to investigate differentially mutated genes.For each gene, a 2 × 2 contingency table of mutation frequencies was constructed for different stage groups, and Fisher's exact test was used to identify genes with significant differences in mutation frequency.Furthermore, differential mRNA expression was assessed through DESeq2 analyses of differentially expressed genes (DEGs) using transcripts per million mapped reads (TPM) values.DESeq2 analyzes differential expression based on a model using the negative binomial distribution.All statistical analyses were performed using R software (version 4.2.2).

Gene set enrichment and over-representation analysis
Gene Set Enrichment Analysis (GSEA) was conducted based on the presence or absence of PNI in OSCC patients.GSEA calculates the enrichment score (ES) by analyzing the ranked list of genes, incrementing a running-sum statistic when a gene is in the gene set and decrementing it when it is not.The ES reflects the degree to which a gene set is overrepresented at the top or bottom of the ranked list of genes.
The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database resource used to understand high-level functions and utilities of biological systems.The KEGG mapper online tool was employed to perform KEGG analysis.Additionally, the Database for Annotation, Visualization, and Integrated Discovery (DAVID) from the National Institutes of Health in Bethesda, MA, was used for Gene Ontology (GO) enrichment analysis.

Patients
In this study, we analyzed 224 patients from TCGA dataset and 96 patients from GEO dataset with primary OSCC were analyzed (Table S1 and S2).The TCGA patient cohort had an average age of 61.7 years (range, 19-90 years), consisting of 153 males and 71 females.Among the 224 TCGA patients, 103 had passed away, and the median survival time was 36.4 months.The mean follow-up period for the surviving patients was 36 months.Diseasespecific and overall survival analyses were performed using the GEO dataset as the validation set, revealing that 29 and 50 out of the 96 patients died, respectively.Multivariate Cox proportional hazard regression analysis was conducted on all significant clinical variables.Stage, perineural Invasion (PNI), lymphovascular invasion, and treatment modality all yielded statistically significant results (Table S3).
Among these genes, GPAM also showed significant prognostic relevance in univariate Cox analysis of TCGA mRNA dataset (p = 0.03; HR = 1.5).In the additional GEO RNA validation set, CNPPD1 and FLI1 exhibited significant expression in disease-specific and overall Cox analyses, respectively (p = 0.02 and 0.03; Table 2).ZNF554, RAB2A, and SORCS2 showed borderline prognostic significance in this validation set (Table 2).These findings indicate the potential prognostic importance of these genes in OSCC, particularly in the context of PNI.
Among the 22 DNA mutated genes with prognostic significance (Fig. 1 and Table S3), the mRNAs of TGFBR1 and RPS6KA4 were significantly overexpressed in PNI patients (log2FC = 0.27 and 0.20; p = 0.002 and 0.026, respectively; Fig. 3).Conversely, the mRNAs of TYRO3 and GPR137 were significantly underexpressed (log2FC = -0.21and -0.14; p = 0.045 and 0.067, respectively; Fig. 3).Similar results were observed in additional GEO RNA validation sets.TGFBR1 and RPS6KA4 maintained prognostically significant expressions with HRs greater than 1 in multivariate Cox models for disease-specific survival (p = 0.02 and 0.002; Table 2).Conversely, TYRO3 and GPR137 demonstrated prognostically significant expressions with HRs less than 1 in disease-specific and overall Cox analyses, respectively (p = 0.006 and 0.029; Table 2).These findings suggest a potential link between the DNA mutations of these genes and their corresponding mRNA expressions in the context of PNI, highlighting their importance as prognostic indicators in OSCC.
In a previous comparative analysis, a total of 2801 differentially expressed genes were identified (p < 0.05).Out of these, univariate Cox analyses revealed that 245 genes were significantly associated with OSCC prognosis (p < 0.05).GSEA was also performed to identify hallmark pathways significantly associated with these 245 prognostic genes related to PNI (p < 0.05; Fig. 2c).The analysis revealed upregulated hallmark pathways, including epithelial-mesenchymal transition, TNFα signaling via NF-κB, and IL2 STAT5 signaling.Notably, TGFB1, which is related to TGFBR1, was a member of the GSEA hallmark epithelial-mesenchymal transition.GSEA was also performed to identify hallmark pathways significantly associated with these 245 prognostic genes related to PNI (p < 0.05; Fig. 2c).The analysis revealed upregulated hallmark pathways, including epithelial-mesenchymal transition, TNFα signaling via NF-κB, and IL2 STAT5 signaling.Notably, TGFB1, which is related to TGFBR1, was a member of the GSEA hallmark epithelial-mesenchymal transition.
Subsequently, the 245 prognostic significant genes underwent further adjusted for all major clinical variables using multivariate Cox regression.As a result, 141 genes displayed significant outcomes in multivariate Cox analyses.Among these 141 prognostic mRNA genes, TPSD1, ZNF43, TEX10, and PSD3 (Table 3) also showed significant results in DNA frequency comparison analyses (p = 0.005, 0.011, 0.038, and 0.014, respectively).Notably, TEX10 exhibited consistent results in additional GEO RNA validation sets (Table 2).However, PSD3, despite demonstrating prognostically significant expression, showed different HRs, indicating the need for additional research (Table 2).
The 141 prognostic significant genes from the multivariate Cox regression were then subjected to Gene Ontology (GO) and KEGG pathway enrichment analyses (Fig. 4).In the GO enrichment analyses for significant prognostic PNI genes, biological processes, molecular functions, and cellular components were examined Vol:.( 1234567890 www.nature.com/scientificreports/(Fig. 4a-c).Additionally, the significant KEGG pathway hsa00270 cysteine and methionine metabolism (FDR q = 0.035, Fig. 4d) was investigated, and the locations of differentially expressed mRNA genes (p < 0.01; Fig. S1a) were analyzed.

Amino acid alterations and biological pathways in prognostic DNA and mRNA profiles
Within the DNA mutations exhibiting significant differences related to PNI, genes bearing substantial prognostic implications were identified through univariate Cox analyses (Table S3).Among these genes, the amino acid changes of significantly differentially expressed mRNA genes were examined (Fig. 5a-c).Additionally, genes with notable prognostic implications in both univariate and multivariate Cox analyses among the significantly  www.nature.com/scientificreports/differentially expressed mRNA genes associated with PNI were identified.For this subset, we further explored the amino acid changes in genes that exhibited significant differences in DNA mutations linked to PNI (Fig. 5d-f).

Discussion
In the present study, we compared the genomes and transcriptomes of OSCC tumors in relation to PNI and OS and validated this finding using GEO datasets.Among the DNA mutations that were found to be significant in PNI and showed significance in univariate Cox regression, TGFBR1, TYRO3, RPS6KA4, and GPR137 displayed significant differential mRNA expression.Additionally, among the mRNA genes that exhibited significant differential expression in PNI and were significant in multivariate Cox regression, TPSD1, ZNF43, TEX10, and PSD3 demonstrated significant differential DNA mutations.Among them, six genes (TGFBR1, RPS6KA4, TYRO3, GPR137, TEX10 and TPSD1) were validated using mRNA prognostic data.
DEG analysis revealed that prognostic mRNA of PNI were related to hallmarks epithelial mesenchymal transition, TNFα signaling via NF-κB, and IL2 /STAT5 signaling in GSEA analysis and ribonucleoprotein complex biogenesis, chromatin remodeling, protein-DNA complex subunit organization, and cysteine and methionine metabolism etc. in GO and KEGG analysis.Especially, the hallmarks epithelial mesenchymal transition pathway overlapped with the DNA mutation mentioned above, TGFBR1.EMT is a process by which polar epithelial cells are transformed into mesenchymal cells with a significantly enhanced ability to migrate and invade the surrounding tissues and already well known for its relationship with PNI in HNSCC 7,16 .There were strong expression of BDNF, TrkB, and p75NGFR in perineural tumor cells of human cutaneous squamous cell carcinoma 17 , and over-expression of TrkB results in altered expression of molecular mediators of EMT, including downregulation of E-cadherin and upregulation of Twist in HNSCC 18 .Considering TNFα signaling via NF-κB, TNFα promotes oral cancer proliferation, progression, and nociception at least partially by activating Schwann cells 19 .IL2 /STAT5  signaling is a key regulator of CD4 + T cell gene program and known to enrich in low risk group of HNSCC 20,21 .GO and KEGG analysis results were broadly associated with maintenance, translation and mitosis of DNA and RNA.However, key genes and pathways in our study was different from the previous studies.Zhang et al. evaluated the transcriptomes of PNI in HNSCC using weighted gene coexpression network analysis and found 12 genes (TIMP2, MIR198, LAMA4, FAM198B, MIR4649, COL5A1, COL1A2, OLFML2B, MMP2, FBN1, ADAM12, and PDGFRB) were highly expressed in fibroblasts 7 .Otherwise, Saidak et al. identified a specific gene expression profile highly enriched in genes related to muscle differentiation/ function in HNSCC 6 .These differences seemed to have originated from our study's assignment of equal importance to both survival and PNI.PNI is a well-known risk factors in the prognosis of OSCC and we want to identify the key genes that could influence both survival and PNI.We identified eight overlapping genes, TGFBR1, RPS6KA4, TYRO3, GPR137, ZNF43, TEX10 and TPSD1 and PSD3, that exhibited both DNA mutations and differential mRNA expressions regarding PNI.TGFBR1 and TEX10 was a risk gene in DNA survival analysis and overexpressed in PNI cases.Transforming growth factor-β receptor type 1 (TGFBR1) is a receptor for TGF-β ligands that modulates cancer stem cell properties and epithelial-mesenchymal transition.TGFBR1 is a crucial inducer of lung cancer progression, and inhibition of TGFBR1 effectively reduces bone metastasis in prostate cancer 22 .Regarding neurogenesis, TGF-β is a major extracellular signaling molecule in neuronal cells during CNS angiogenesis, mediating the recruitment of endothelial TGFBR1 to regulate endothelial proliferation and cerebrovascular integrity 23 .Testis expressed 10 (Tex10) has a role in transcriptional regulation, ribosome biogenesis, the establishment and maintenance of pluripotency and is known to be upregulated, promoting cancer stem cell properties and therapy resistance in hepatocellular carcinoma, esophageal SCC and urinary bladder carcinoma 24,25 .Ribosomal Protein S6 Kinase A4 for each mRNA, based on the presence of PNI, were validated.TGFBR1, TYRO3, RPS6KA4, and GPR137 not only exhibited significant mRNA expression in cases with PNI but also demonstrated significant prognostic implications in the validation set (Table 2).(RPS6KA4), identified as a non-risk gene in DNA survival analysis and found to be overexpressed in PNI cases, belongs to the ribosomal S6 kinase family.It acts as a downstream effector of MAPK pathway, phosphorylating substrates that are involved in cell growth, proliferation, motility and survival.In a study involving hepatocellular carcinoma patients, those with higher expression levels of RPS6KA4 exhibited a poorer survival 26 .TYRO3 and GPR137 were a risk gene and under-expressed in PNI patients.TYRO3 is a member of the TAM (TYRO3, AXL, and MERTK) family of transmembrane receptor tyrosine kinases, which have been implicated in tumorigenesis, metastasis and therapeutic resistance 27 .Tumors exhibiting high TYRO3 expression often display resistance in patients receiving anti-PD-1/PD-L1 therapy.Inhibition of TYRO3 promotes tumor ferroptosis and sensitizes resistant tumors to anti-PD-1 therapy 28 .Concerning neurogenesis, the tyro3 gene is known to be expressed during central nervous system neurogenesis and displays distinct and highly regionalized patterns of expression in the adult brain 29 .However, TCGA mRNA expression study revealed underexpression in PNI patients and HR lower than 1 in GEO data.This discrepancy warrants further focused study regarding mechanisms.G proteincoupled receptor 137 (GPR137) has been recognized as a key player in carcinogenesis and cancer progression, and its down regulation has been shown to inhibit proliferation and promote apoptosis in leukemia cells.Patients with high GPR137 expression exhibited shorter overall survival time than those with low expression in bladder cancer 30,31 .However, other studies reported a different role for GPR137, where it promotes cell cycle exit and neuronal differentiation in neuro2A cells 32 .Our study's finding of under-expression of GPR137 in PNI patients may be associated with this characteristic.These 5 genes were differentially expressed in GEO data with similar expression pattern.These results indicate a possible association between the DNA mutations of these genes TYRO3, (c) RPS6KA4 depict genes that exhibit substantial prognostic implications and demonstrate significant differences in DNA mutations related to Amino acid alterations in genes with significantly differentially expressed mRNA are illustrated for these genes.Similarly, (d) TEX10, (e) TPSD1, and (f) PSD3 represent genes displaying notable prognostic implications among those with significantly differentially expressed mRNA in relation to PNI.For this subset, the figure illustrates the amino acid changes in genes that exhibited significant differences in DNA mutations.
Vol www.nature.com/scientificreports/and their corresponding mRNA expressions in relation to PNI, underscoring their significance as markers for prognosis in OSCC.ZNF43 and TPSD1 were identified as nonhazardous genes and found to be underexpressed in PNI patients.Zinc finger protein 43 (ZNF43) is known to be involved in transcriptional regulation and is classified as a tumor-suppressor gene.Methylation at specific CpGsites of ZNF43 was known to indicates aggressive behavior 33 .However, our validation did not reveal the significance and further clinical corroboration would be necessary.Tryptase Delta 1 (TPSD1) belongs to a family of trypsin-like serine proteases and is thought to play a role in the regulation of mRNA stability.A specific mutation in TPSD1 was observed in colon cancer patients who did not respond well to chemotherapy 34 .Our GEO RNA validation result were not consistent with TCGA mRNA result with different HR, indicating the need for additional research.Pleckstrin And Sec7 Domain Containing 3 (PSD3) was a risk gene and its expression also affect the patient's survival.PSD3 acted as a potential biomarker predicting relapse of acute myeloid leukemia in cytogenetically normal adult patients and involved in breast cancer metastasis, astrocytoma and papillary thyroid cancer progression 35 .This study has the limitation.While we conducted an analysis by comparing somatic mutation data with mRNA expression data, it is important to acknowledge that the identified genes have only been validated using prognostic data.Nevertheless, our bioinformatics-driven approach in identifying key genes and pathways will provides insights for prospective research endeavors.

Conclusion
In summary, through a comparative analysis of prognosis and PNI between DNA mutation and mRNA expression data of TCGA, we identified eight overlapping genes, TGFBR1, RPS6KA4, TYRO3, GPR137, ZNF43, TEX10, TPSD1, and PSD3, associated with PNI in OSCC.Prognostic DNA mutations clustered as the following terms: nervous system development (ADGRG1, PCDHGC4, TYRO3, TGFBR1), regulation of DNA-templated transcription (RPS6KA4, ZNF43, ZNF699, ZNF554, TGFBR1), and transforming growth factor beta binding (TGFBR1, VASN).Prognostic mRNA expression analysis showed the upregulation of the epithelial mesenchymal transition pathway.Six genes (TGFBR1, RPS6KA4, TYRO3, GPR137, TEX10 and TPSD1) were validated using prognostic data.Considering that PNI serves as an independent risk factor for poor prognosis, our findings provide a valuable perspective for the development of novel treatment strategies tailored to OSCC patients with PNI.

Figure 3 .
Figure 3. Kaplan-Meier survival curves illustrating the 10-year overall survival according to (a) TGFBR1, (b) TYRO3, (c) GPR137, and (d) RPS6A4 mutations in oral squamous cell carcinoma with perineural invasion (PNI) (N = 224).The log2FC values and associated significant p-values of differentially expressed genes (DEGs)for each mRNA, based on the presence of PNI, were validated.TGFBR1, TYRO3, RPS6KA4, and GPR137 not only exhibited significant mRNA expression in cases with PNI but also demonstrated significant prognostic implications in the validation set (Table2).

Figure 4 .
Figure 4. Results of Gene Ontology (GO) enrichment analysis for genes with significant prognostic implications associated with perineural invasion in oral squamous cell carcinoma.The analysis encompasses (a) biological processes, (b) molecular functions, and (c) cellular components.Additionally, (d) KEGG pathway enrichment analysis is provided.These analyses are presented based on their statistical significance.

Figure 5 .
Figure 5. Amino acid alterations of genes with significant prognostic implications across DNA mutations and mRNA expression in oral squamous cell carcinoma patients with perineural invasion.(a) TGFBR1, (b)TYRO3, (c) RPS6KA4 depict genes that exhibit substantial prognostic implications and demonstrate significant differences in DNA mutations related to Amino acid alterations in genes with significantly differentially expressed mRNA are illustrated for these genes.Similarly, (d) TEX10, (e) TPSD1, and (f) PSD3 represent genes displaying notable prognostic implications among those with significantly differentially expressed mRNA in relation to PNI.For this subset, the figure illustrates the amino acid changes in genes that exhibited significant differences in DNA mutations.

Table 1 .
Multivariate Cox regression models of prognostically significant genes in perineural invasion of oral squamous cell carcinoma (The Cancer Genome Atlas patients, N = 224).AJCC American Joint Committee on Cancer, RT radiotherapy, CCRT concurrent chemotherapy and radiotherapy, HR hazard ratio.a The statistical significance of fitted model of Cox proportional hazard ratio was calculated by likelihood ratio test.(p < 0.0001).b Other tumor sites included floor of mouth, buccal mucosa, mandible, maxilla, and lip.c p < 0.05.

Table 2 .
Multivariate Cox regression models of prognostically significant RNA genes in HPV-negative oral squamous cell carcinoma patients (GSE41613 patients, N = 96).AJCC American Joint Committee on Cancer.a The statistical significance of fitted model of Cox proportional hazard ratio was calculated by likelihood ratio test.(p < 0.001).b p < 0.05.

Table 3 .
Multivariate Cox regression models of prognostically significant genes in mRNA expression associated with DNA mutations in perineural invasion of oral squamous cell carcinoma (The Cancer Genome Atlas patients, N = 218).AJCC American Joint Committee on Cancer, RT radiotherapy, CCRT concurrent chemotherapy and radiotherapy, HR hazard ratio.a The statistical significance of fitted model of Cox proportional hazard ratio was calculated by likelihood ratio test.(p < 0.0001).b Other tumor sites included floor of mouth, buccal mucosa, mandible, maxilla, and lip.c p < 0.05.